In this report, we propose a video-language pretraining (VLP) based solution \cite{kevin2022egovlp} to the EPIC-KITCHENS-100 Multi-Instance Retrieval (MIR) challenge. In particular, we exploit the recently released Ego4D dataset \cite{grauman2021ego4d} to pioneer egocentric VLP in terms of the pretraining dataset, the pretraining objective, and the development set. Based on these three designs, we develop a pretrained video-language model that is able to transfer its egocentric video-text representation to the MIR benchmark. Furthermore, we devise an adaptive multi-instance max-margin loss to effectively fine-tune the model and equip it with a dual-softmax technique for reliable inference. Our best single model achieves strong performance on the challenge test set, with 47.39% mAP and 61.44% nDCG. The code is available at https://github.com/showlab/egovlp.
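As a rough illustration of the dual-softmax inference trick mentioned above, the sketch below re-ranks a video-text similarity matrix by multiplying its row-wise and column-wise softmax normalizations; the function name, the symmetric formulation, and the temperature value are assumptions for illustration rather than the authors' exact implementation.

```python
import torch

def dual_softmax(sim: torch.Tensor, temperature: float = 100.0) -> torch.Tensor:
    """Re-rank a video-text cosine-similarity matrix of shape [V, T].

    Each entry is re-weighted by the product of a softmax over the text axis
    and a softmax over the video axis, so only mutually confident pairs keep
    a high score at inference time.
    """
    p_v2t = torch.softmax(sim * temperature, dim=1)  # per-video distribution over texts
    p_t2v = torch.softmax(sim * temperature, dim=0)  # per-text distribution over videos
    return p_v2t * p_t2v

# Usage: scores = dual_softmax(video_emb @ text_emb.T), then rank texts per video by `scores`.
```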
Short-term load forecasting (STLF) plays a significant role in the operation of electricity trading markets. Given the growing concern over data privacy, federated learning (FL) has been increasingly adopted in recent research to train STLF models for utility companies (UCs). Encouragingly, in the wholesale market, since it is not realistic for power plants (PPs) to access UCs' data directly, FL is a feasible solution for PPs to obtain an accurate STLF model. However, due to FL's distributed nature and the intense competition among UCs, defects occur increasingly often and lead to poor performance of the STLF model, indicating that adopting FL alone is not enough. In this paper, we propose a DRL-assisted approach, DEfect-AwaRe federated Soft Actor-Critic (DearFSAC), to robustly train an accurate STLF model for PPs to forecast precise short-term utility demand. First, we design an STLF model based on long short-term memory (LSTM) using only historical load data and time data. Furthermore, considering the uncertainty of defect occurrence, a deep reinforcement learning (DRL) algorithm is adopted to assist FL by mitigating the model degradation caused by defects. In addition, for faster convergence of FL training, an auto-encoder is designed for dimension reduction and for quality evaluation of uploaded models. In simulations, we validate our approach on real 2019 data from UCs in Helsinki. The results show that DearFSAC outperforms all other approaches regardless of whether defects occur.
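A minimal sketch of the kind of LSTM-based forecaster described above is shown below, taking a window of historical load plus calendar features and predicting the next load value; the layer sizes, feature set, and forecast horizon are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class LSTMLoadForecaster(nn.Module):
    """Toy LSTM forecaster: a window of past load + calendar features -> next-step load."""

    def __init__(self, n_features: int = 4, hidden: int = 64, horizon: int = 1):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, horizon)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [batch, window, n_features], e.g. load, hour-of-day, day-of-week, holiday flag
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])  # forecast from the last hidden state

# In a federated round, each utility company would fit this locally on its own
# load history and upload only the resulting weights for aggregation.
```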
Renewable energy resources (RERs) have been increasingly integrated into modern power systems, especially in large-scale distribution networks (DNs). In this paper, we propose a deep reinforcement learning (DRL)-based approach to dynamically search for the optimal operation point, i.e., the optimal power flow (OPF), in DNs with a high uptake of RERs. Considering the uncertainty and voltage fluctuation issues caused by RERs, we formulate OPF as a multi-objective optimization (MOO) problem. To solve the MOO problem, we develop a novel DRL algorithm that leverages the graphical information of the distribution network. Specifically, we employ the state-of-the-art DRL algorithm, the deep deterministic policy gradient (DDPG), to learn an optimal strategy for OPF. Since power flow reallocation in the DN is a consecutive process in which nodes are self-correlated and interrelated in both temporal and spatial views, to make full use of the DN's graphical information we develop a multi-grained attention-based spatial-temporal graph convolution network (MG-ASTGCN) for spatial-temporal graph information extraction, preparing for the subsequent sequential DDPG. We validate the proposed DRL-based approach on modified IEEE 33-, 69-, and 118-bus radial distribution systems (RDSs) and show that it outperforms other benchmark algorithms. Our experimental results also reveal that MG-ASTGCN significantly accelerates the DDPG training process and improves DDPG's capability in reallocating power flow for OPF. The proposed DRL-based approach also promotes the stability of DNs in the presence of node faults, especially for large-scale DNs.
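The following is a minimal sketch of the continuous actor-critic pair a DDPG agent for OPF might use, together with one plausible scalarization of the multi-objective reward; the state encoding (e.g., an MG-ASTGCN embedding), the action semantics, the network widths, and the reward weights are all illustrative assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Maps a grid-state embedding (e.g. from a spatial-temporal GCN) to
    continuous control actions such as normalized power set-points in [-1, 1]."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),
        )

    def forward(self, state):
        return self.net(state)

class Critic(nn.Module):
    """Estimates Q(state, action) for the scalarized multi-objective OPF reward."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def opf_reward(generation_cost, voltage_deviation, w_cost=1.0, w_volt=10.0):
    # Illustrative scalarization of the multi-objective OPF: penalize generation
    # cost and the deviation of bus voltages from their nominal values.
    return -(w_cost * generation_cost + w_volt * voltage_deviation)
```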
In this paper, we present a large-scale detailed 3D face dataset, FaceScape, and a corresponding benchmark to evaluate single-view facial 3D reconstruction. Trained on FaceScape data, a novel algorithm is proposed to predict elaborate riggable 3D face models from a single image input. The FaceScape dataset provides 18,760 textured 3D faces captured from 938 subjects, each with 20 specific expressions. The 3D models contain pore-level facial geometry and are also processed to be topologically uniform. These fine 3D facial models can be represented as a 3D morphable model for the rough shape plus displacement maps for the detailed geometry. Taking advantage of the large-scale and high-accuracy dataset, a novel algorithm is further proposed to learn expression-specific dynamic details using a deep neural network. The learned relationship serves as the foundation of our system for predicting 3D faces from a single image input. Unlike previous methods, our predicted 3D models provide highly detailed geometry under different expressions. We also use FaceScape data to generate in-the-wild and in-the-lab benchmarks to evaluate recent single-view face reconstruction methods. The evaluation is reported and analyzed along the dimensions of camera pose and focal length, providing a faithful and comprehensive assessment and revealing new challenges. The unprecedented dataset, benchmarks, and code have been released to the public for research purposes.
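To make the "coarse morphable model plus displacement map" representation concrete, here is a small sketch of how such a face might be assembled: a linear combination of shape bases gives the coarse mesh, and a UV-space displacement map pushes vertices along their normals for fine detail. The array shapes, the identity/expression factorization, and the nearest-neighbor UV sampling are simplifying assumptions, not FaceScape's actual pipeline.

```python
import numpy as np

def coarse_face(mean_shape, id_basis, exp_basis, id_coeff, exp_coeff):
    """Coarse morphable-model geometry: mean shape + identity and expression offsets.

    mean_shape: [N, 3] vertices; id_basis: [K_id, N, 3]; exp_basis: [K_exp, N, 3].
    """
    return (mean_shape
            + np.tensordot(id_coeff, id_basis, axes=1)
            + np.tensordot(exp_coeff, exp_basis, axes=1))

def add_displacement(coarse, normals, displacement_map, uv):
    """Push each vertex along its normal by the value sampled from a UV-space
    displacement map, recovering fine detail on top of the coarse shape."""
    h, w = displacement_map.shape
    u = np.clip((uv[:, 0] * (w - 1)).astype(int), 0, w - 1)
    v = np.clip((uv[:, 1] * (h - 1)).astype(int), 0, h - 1)
    d = displacement_map[v, u][:, None]          # per-vertex scalar displacement
    return coarse + d * normals
```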
Time series anomaly detection strives to uncover potential abnormal behaviors and patterns from temporal data, and has fundamental significance in diverse application scenarios. Constructing an effective detection model usually requires adequate training data stored in a centralized manner; however, this requirement sometimes cannot be satisfied in realistic scenarios. As a prevailing approach to this problem, federated learning has demonstrated its power to cooperate with the distributed data available while protecting the privacy of data providers. However, it is still unclear how existing time series anomaly detection algorithms perform with decentralized data storage and privacy protection through federated learning. To study this, we conduct a federated time series anomaly detection benchmark, named FedTADBench, which involves five representative time series anomaly detection algorithms and four popular federated learning methods. We would like to answer the following questions: (1) How do time series anomaly detection algorithms perform when combined with federated learning? (2) Which federated learning method is the most appropriate one for time series anomaly detection? (3) How do federated time series anomaly detection approaches perform on different partitions of data across clients? Extensive results as well as corresponding analyses are provided from experiments with various settings. The source code of our benchmark is publicly available at https://github.com/fanxingliu2020/FedTADBench.
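As background for how a detection model is trained cooperatively in such a setting, the sketch below shows FedAvg-style server-side aggregation of client model weights, which is typically one of the federated learning baselines considered; the function name and the PyTorch state_dict representation are assumptions for illustration and do not reflect FedTADBench's actual code.

```python
import copy
import torch

def fedavg(client_states, client_sizes):
    """FedAvg aggregation: size-weighted average of client model state_dicts.

    client_states: list of state_dicts from locally trained anomaly-detection models.
    client_sizes:  number of training samples held by each client.
    """
    total = float(sum(client_sizes))
    global_state = copy.deepcopy(client_states[0])
    for key in global_state:
        global_state[key] = sum(
            state[key].float() * (n / total)
            for state, n in zip(client_states, client_sizes)
        )
    return global_state  # broadcast back to clients for the next round
```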
Few-shot named entity recognition (NER) targets generalizing to unseen labels and/or domains with few labeled examples. Existing metric learning methods compute token-level similarities between query and support sets, but are not able to fully incorporate label semantics into modeling. To address this issue, we propose a simple method to largely improve metric learning for NER: 1) multiple prompt schemas are designed to enhance label semantics; 2) we propose a novel architecture to effectively combine multiple prompt-based representations. Empirically, our method achieves new state-of-the-art (SOTA) results under 16 of the 18 considered settings, substantially outperforming the previous SOTA by an average of 8.84% and a maximum of 34.51% in relative gains of micro F1. Our code is available at https://github.com/AChen-qaq/ProML.
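A minimal sketch of how prompt-derived label representations might be combined with support-set prototypes for token-level metric learning is given below; the prompt template, the blending weight, and the nearest-prototype decision rule are illustrative assumptions rather than the exact ProML architecture.

```python
import torch
import torch.nn.functional as F

def prototype_classify(query_tokens, support_tokens, support_labels, label_prompts, alpha=0.5):
    """Token-level nearest-prototype classification with prompt-based label semantics.

    query_tokens:   [Q, D] token embeddings to classify.
    support_tokens: [S, D] labeled support-set token embeddings.
    support_labels: [S] integer labels in 0..C-1 (each class assumed present).
    label_prompts:  [C, D] encoder outputs for prompted label descriptions,
                    e.g. the representation of "X is a person entity".
    """
    num_classes, dim = label_prompts.shape
    # Class prototypes averaged from the support set.
    proto = torch.zeros(num_classes, dim)
    for c in range(num_classes):
        proto[c] = support_tokens[support_labels == c].mean(dim=0)
    # Blend support prototypes with prompt-derived label representations.
    proto = alpha * proto + (1 - alpha) * label_prompts
    sims = F.cosine_similarity(query_tokens.unsqueeze(1), proto.unsqueeze(0), dim=-1)
    return sims.argmax(dim=-1)  # predicted label per query token
```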
This study uses TikTok (N = 8,173) to examine the protest paradigm on short-form video platforms during the recent Black Lives Matter movement. A computer-mediated visual analysis method, computer vision, is employed to identify the presence of four visual frames of protest (riot, confrontation, spectacle, and debate) in multimedia content. Results of descriptive statistics and t-tests show that the three delegitimizing frames, riot, confrontation, and spectacle, were rarely found on TikTok, whereas the debate frame, which empowers marginalized communities, dominated the public sphere. However, although the three delegitimizing frames received lower social media visibility, as measured by views, likes, shares, followers, and duration, legitimizing elements such as the debate frame, minority identities, and unofficial sources were generally not favored by TikTok audiences either. This study concludes that although short-form video platforms may challenge the protest paradigm on the content creators' side, audience preferences as measured by social media visibility may still conform to the protest paradigm.
Domain adaptation aims at generalizing a high-performance learner on a target domain via utilizing the knowledge distilled from a source domain which has a different but related data distribution. One solution to domain adaptation is to learn domain invariant feature representations while the learned representations should also be discriminative in prediction. To learn such representations, domain adaptation frameworks usually include a domain invariant representation learning approach to measure and reduce the domain discrepancy, as well as a discriminator for classification. Inspired by Wasserstein GAN, in this paper we propose a novel approach to learn domain invariant feature representations, namely Wasserstein Distance Guided Representation Learning (WDGRL). WDGRL utilizes a neural network, denoted by the domain critic, to estimate empirical Wasserstein distance between the source and target samples and optimizes the feature extractor network to minimize the estimated Wasserstein distance in an adversarial manner. The theoretical advantages of Wasserstein distance for domain adaptation lie in its gradient property and promising generalization bound. Empirical studies on common sentiment and image classification adaptation datasets demonstrate that our proposed WDGRL outperforms the state-of-the-art domain invariant representation learning approaches.
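The adversarial objective described above can be sketched as follows: a small critic network scores features, the difference of its mean scores on source and target batches serves as the empirical Wasserstein estimate, and the feature extractor is trained to shrink that estimate. The network sizes, the WGAN-GP-style gradient penalty used to keep the critic approximately 1-Lipschitz, and the training schedule in the comments are assumptions for illustration, not necessarily the paper's exact choices.

```python
import torch
import torch.nn as nn

feature_extractor = nn.Sequential(nn.Linear(300, 128), nn.ReLU(), nn.Linear(128, 64))
domain_critic = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 1))

def wasserstein_estimate(src_feat, tgt_feat):
    # Empirical Wasserstein-1 estimate under the critic (Kantorovich-Rubinstein form).
    return domain_critic(src_feat).mean() - domain_critic(tgt_feat).mean()

def gradient_penalty(src_feat, tgt_feat, lam=10.0):
    # Penalize deviation of the critic's gradient norm from 1 on interpolated
    # features (assumes equal batch sizes), keeping it approximately Lipschitz.
    eps = torch.rand(src_feat.size(0), 1)
    inter = (eps * src_feat + (1 - eps) * tgt_feat).requires_grad_(True)
    grads = torch.autograd.grad(domain_critic(inter).sum(), inter, create_graph=True)[0]
    return lam * ((grads.norm(2, dim=1) - 1) ** 2).mean()

# Training alternates two steps per batch:
#   1) update the critic to maximize wasserstein_estimate - gradient_penalty;
#   2) update the feature extractor (and classifier) to minimize
#      classification loss + beta * wasserstein_estimate, shrinking the domain gap.
```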
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes the image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple while its performance is impressive. CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even if the LiDAR input is missing. Code will be released at https://github.com/junjie18/CMT.
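As a toy illustration of the token-level fusion idea, the sketch below concatenates image and point-cloud tokens into a single memory and decodes 3D boxes with learnable object queries; the embedding width, number of queries, box parameterization, and use of a vanilla PyTorch transformer decoder are assumptions for illustration and are far simpler than the actual CMT design.

```python
import torch
import torch.nn as nn

class CrossModalDetectorSketch(nn.Module):
    """Toy cross-modal detector: fuse image and point-cloud tokens with a
    transformer decoder driven by learnable object queries."""

    def __init__(self, dim=256, num_queries=100, num_classes=10):
        super().__init__()
        self.queries = nn.Embedding(num_queries, dim)
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=6)
        self.box_head = nn.Linear(dim, 7)        # x, y, z, w, l, h, yaw
        self.cls_head = nn.Linear(dim, num_classes)

    def forward(self, image_tokens, point_tokens):
        # Both token sets are assumed to already carry 3D position encodings,
        # so simple concatenation aligns the modalities implicitly.
        memory = torch.cat([image_tokens, point_tokens], dim=1)  # [B, Ni+Np, dim]
        queries = self.queries.weight.unsqueeze(0).expand(memory.size(0), -1, -1)
        hs = self.decoder(queries, memory)
        return self.box_head(hs), self.cls_head(hs)  # per-query boxes and class logits
```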
Knowledge graphs (KG) have served as the key component of various natural language processing applications. Commonsense knowledge graphs (CKG) are a special type of KG, where entities and relations are composed of free-form text. However, previous works in KG completion and CKG completion suffer from long-tail relations and newly-added relations which do not have many known triples for training. In light of this, few-shot KG completion (FKGC), which requires the strengths of graph representation learning and few-shot learning, has been proposed to address the problem of limited annotated data. In this paper, we comprehensively survey previous attempts on such tasks in the form of a series of methods and applications. Specifically, we first introduce FKGC challenges, commonly used KGs, and CKGs. Then we systematically categorize and summarize existing works in terms of the type of KGs and the methods. Finally, we present applications of FKGC models on prediction tasks in different areas and share our thoughts on future research directions of FKGC.